Automated detection of Alzheimer’s disease: a multi-modal approach with 3D MRI and amyloid PET

Recent advances in deep learning and imaging technologies have revolutionized automated medical image analysis, especially in diagnosing Alzheimer’s disease through neuroimaging. Despite the availability of various imaging modalities for the same patient, the development of multi-modal models leveraging these modalities remains underexplored. This paper addresses this gap by proposing and evaluating classification models using 2D and 3D MRI images and amyloid PET scans in uni-modal and multi-modal frameworks. Our findings demonstrate that models using volumetric data learn more effective representations than those using only 2D images. Furthermore, integrating multiple modalities enhances model performance over single-modality approaches significantly. We achieved state-of-the-art performance on the OASIS-3 cohort. Additionally, explainability analyses with Grad-CAM indicate that our model focuses on crucial AD-related regions for its predictions, underscoring its potential to aid in understanding the disease’s causes.


Related work
Recent advancements in deep learning have revolutionized the diagnosis and detection of diseases, including Alzheimer's, by offering a significant advantage over traditional machine learning algorithms.This advantage primarily lies in deep learning's capacity to automatically extract features from data, eliminating the complexity and potential bias associated with manual feature extraction.In neuroimaging analysis for AD diagnosis, a substantial focus has been leveraging MRI data.
Among noteworthy contributions, Altay et al. 14 recently proposed two different attention-based models and compared their performance with a 3D CNN baseline.The first attention model is a recurrent attention network, which extracts glimpses from stacked MRI images and feeds them into recurrent attention units to obtain a classification.The second model is a modified and repurposed Transformer, which first extracts the features of an image sequence from a pre-trained network and then feeds these features to a Transformer network to classify the sequence.The Transformer model outperformed the other approaches by achieving ∼ 91% accuracy in preclinical AD detection on OASIS-3 data.Even more recently, Helaly et al. 15 proposed two methods to classify MRI images and detect AD.The first method uses simple CNN architectures that deal with 2D and 3D structural brain scans from the ADNI dataset based on 2D and 3D convolutions.The second method applies the transfer learning principle to exploit pre-trained models for medical image classification.The latter strategy achieved a high accuracy of approximately 97% for multi-class AD stage classification.
While structural MRI has been the cornerstone of AD research, recent studies have also highlighted the significant role of amyloid PET in enhancing diagnostic capabilities.Amyloid PET offers a complementary perspective by enabling the detection and quantification of β-amyloid deposits in the cerebral cortex, a hallmark of AD.This technique uses contrast materials to visualize and measure the presence of β-amyloid and other substances, providing a more comprehensive understanding of the disease's neuropathology.
In the pursuit of automated diagnosis, De Vries et al. 16 demonstrated the utility of amyloid PET by training a 2D CNN model on sagittal FDG-PET images from the SCIENCe cohort and ADNI.Their model achieved a remarkable 95% accuracy in classifying A β positive and negative scans among patients with cognitive decline.Reith et al. 17 also pursued a similar goal.However, neither study specifically focused on distinguishing between cognitively impaired patients and healthy controls.
Addressing this gap, Tufail et al. 18 investigated the effect of data augmentation techniques on CNN performance for early AD diagnosis using 3D PET scans from the ADNI dataset.Their research found that while the best-reported accuracy reached approximately 86%, combining all augmentation techniques did not yield optimal results.Our preliminary work achieved an 83% accuracy rate using 3D amyloid PET scans from the OASIS-3 dataset 19 .Although this did not represent a state-of-the-art outcome, it provided a promising basis for exploring multi-modal diagnostic strategies.
The exploration of multi-modal neuroimaging for the automatic detection of AD remains relatively underexplored, with few studies aiming to maximize performance by leveraging the combined strengths of different imaging modalities.Zhou et al. 20 introduced a novel three-stage deep feature learning and fusion framework, achieving an impressive 89% accuracy in differentiating healthy individuals from those with AD using combined MRI and PET data from the ADNI dataset.However, this study did not explore the potential benefits of more sophisticated convolutional or volumetric approaches.
Lu et al. 21advanced the field with a multi-modal and multi-scale deep learning framework utilizing MRI and FDG-PET data, reaching an accuracy of approximately 85%.Despite its promise, this method's computational intensity and disregard for the intrinsic three-dimensionality of neuroimaging data, due to its reliance on segmented image patches, present significant limitations.Liu et al. 22 explored a cascade framework that employs 3D CNNs for feature extraction from local image patches, followed by a feature ensemble through a 2D CNN,

Materials
In this study, we employed the OASIS-3 dataset 10 , the most recent iteration of the OASIS series, increasingly recognized as a benchmark for diverse research goals within the scientific community [30][31][32][33] .The dataset is publicly available (https:// www.oasis-brains.org), ensuring broad accessibility for research purposes, with data sharing facilitated by participant consent.It encompasses MRI and PET images from 1098 participants, ages 42 to 95, and includes cognitively normal adults (605) and individuals at various stages of cognitive decline (493).
Our analysis specifically focused on amyloid PET and T1-weighted MRI scans.When feasible, the data collection protocol involved participants undergoing simultaneous PET and MRI scans within a 70-min session, beginning at the tracer injection time, to ensure synchronized acquisition of dynamic PET data alongside standard MRI.In instances where simultaneous imaging was not possible, PET and MRI scans were conducted in separate sessions, typically within a six-month interval, to maintain the relevance of the data across both modalities 10 .
From the 1607 PET sessions in the dataset, we selected 1352 sessions that had been processed using the PET Unified Pipeline, standardizing the spatial resolution to 8mm to reduce variability across scanners.The OASIS-3 dataset includes various types of PET scans, namely PiB-PET, AV45-PET, and FDG-PET, with varying availability by session.For our purposes, we exclusively analyzed amyloid PET scans (PiB-PET and AV45-PET), representing about 93% of the PET scan subset.
To enhance the accuracy of patient labeling and minimize the incidence of false negatives and false positives within our dataset, we employed a targeted algorithm for label correction.This approach was designed to identify and adjust labels based on the temporal sequence of patient diagnoses, ensuring greater consistency and reliability in our data.The algorithm operates under the following criteria: • For a patient initially labeled as negative, if at least one of the two preceding labels (calculated by days since initial entry) was positive, and at least one of the following two labels was also positive, the patient's label was adjusted to positive.• Conversely, for a patient initially labeled as positive, if at least one of the two labels immediately before was negative, and at least one of the following two labels was negative, the patient's label was revised to negative.
Figure 1 illustrates this post-processing approach, showing how labels were adjusted to more accurately reflect the progression of disease status over time based on the sequence and consistency of diagnostic labels.Our analysis labeled each PET scan according to the most recent diagnosis available.Given that not every scan session was directly accompanied by a psychiatric or neurological evaluation, we linked each scan with the temporally closest diagnostic test, regardless of whether it occurred before or after the scan.This process yielded a dataset comprising 1217 scans labeled as negative and 135 as positive.Among the positive scans, 20 were from subsequent visits of patients who maintained a positive diagnosis.Excluding these repetitive positive scans reduced the total count of unique positive cases to 115.We employed a two-level approach to address the significant imbalance between the negative and positive classes: random under-sampling of the negative class www.nature.com/scientificreports/and data augmentation techniques applied to the positive class.The under-sampling process decreased the number of negative scans to 148, enhancing class balance.Further balancing was achieved by applying a series of rotation and mirroring operations to randomly selected positive scans, a method inspired by recent studies on this dataset 14 .
The MRI component of our study mirrored the PET scan analysis.Each MRI was paired with the corresponding PET Unified Pipeline scan, ensuring consistency across modalities.However, when multiple T1-weighted MRI scans were available for a single session, we selected only the most recent scan for inclusion in our dataset.

Classification models
Our models leverage a Convolutional Neural Network as the foundational architecture, tailored to accommodate the specific characteristics of the input scans-either in terms of their dimensionality or modality.The core architecture consists of four convolutional layers, each succeeded by a max pooling layer and batch normalization, cumulatively forming the feature extraction block of the network.The convolutional layers progressively increase in filter count, enhancing the network's ability to capture complex features.Following the feature extraction phase, the architecture transitions into the classification block, which is composed of fully connected layers.This block processes the flattened output from the preceding block, incorporating a dropout layer (set at a 30% rate) to mitigate overfitting.The final layer, equipped with a sigmoid activation function, outputs a probability indicating the likelihood of the positive class, with values ranging from 0 to 1.
The rationale behind selecting CNNs as the foundation of our models is rooted in their proven capability and efficiency in handling image data, particularly in medical imaging and diagnostics.CNNs excel in automatically detecting critical features without the need for manual extraction, making them ideal for analyzing complex neuroimaging data.
Preparation of the data for training necessitated several preprocessing steps.Given the 4D nature of PET scans (3D images acquired over time), we condensed the volumetric data into a single "average" image, thereby simplifying it to a 3D format comparable to MRI scans.Subsequent steps involved isolating the brain within the scans.This was achieved by applying a Gaussian blur (kernel size of 13 × 13 and σ = 150 ) to the images to enhance segmentation via Otsu's threshold method, followed by bounding box identification based on the images' high-intensity regions.To standardize the input data, both PET and MRI images were resized to 128 × 128 voxels, the smallest dimension present in the dataset, to prevent upscaling artifacts.Owing to computational limits, incorporating the entire 3D volume for training was impractical.Thus, we selectively used the central 50 slices of each scan, ensuring the inclusion of significant brain regions without necessitating excessive computational resources.In models designed for 2D input, these slices were treated as independent images.
Our study systematically explored several variations of the proposed CNN architecture, tailored to different imaging modalities and dimensionalities, as outlined below:  www.nature.com/scientificreports/ • 3D MRI + PET (fusion): A dual-branch, multi-modal model that processes MRI and PET scans in parallel, combining features before classification.Each branch functions as a "twin" network, allowing for the distinct extraction and fusion of features from both modalities.Unlike the input model, this fusion approach requires both MRI and PET scans for prediction, offering a comprehensive but less flexible solution.
For all models, binary cross-entropy was used as the loss function, defined as in Eq. ( 1): where y represents the true label, and ŷ is the predicted probability generated by the model.Figure 2 shows the architecture of the fusion model; the others mirror this architecture with the appropriate modifications to make them work in a single mode or with 2D data.

Explanation of the classification outcomes
Our models, inherently "black-box" in nature, do not offer interpretable or explainable outcomes without further analysis.To bridge this gap, we applied Gradient-weighted Class Activation Mapping (Grad-CAM), a technique proposed by Selvaraju et al. 13 , designed to elucidate the decision-making process of CNNs.Grad-CAM generates coarse localization maps that visually emphasize the regions within the original image most influential in predicting a specific concept.
The essence of Grad-CAM lies in its ability to create class-discriminative localization maps, L c GradCAM ∈ R u×v , by leveraging the gradients of the class score, ŷc , for any given class c, against the convolutional layer's feature maps, A k .This process involves global average pooling of the gradients to ascertain the neuron importance weights, α c k , which serve as a condensed representation of how each feature map contributes to the prediction of the target class.The computation of these weights is encapsulated by Eq. ( 2): In the formula, i and j refer to the location of the (i, j)-th pixel, while Z represents the total pixel count.A weighted combination of forwarding activation maps is then applied to obtain a coarse heatmap having the same size as the chosen convolutional layer.ReLU is then applied to obtain those features that have a positive influence on the class of interest, resulting in Eq. ( 3): (1) www.nature.com/scientificreports/By overlaying the Grad-CAM heatmaps onto the AAL2 atlas-a comprehensive anatomical atlas featuring 120 cortical and sub-cortical brain regions 34 -we were able to pinpoint the brain areas most significantly implicated in the model's classifications.Regions with mean Grad-CAM values in the top 90th percentile were identified as critical for the model's decision-making process, highlighting their potential relevance to AD pathology.

Experiments Experimental setting
Our experimental framework was built on Google Colab, using an NVIDIA Tesla P100 GPU to ensure efficient computation.The architecture of our models was developed with the TensorFlow library, a choice inspired by its widespread acceptance and robust capabilities.The source code for our implementation is openly accessible (https:// github.com/ monta narog razia no/ Multi modal-appro ach-for-AD).The dataset underwent a patient-wise random split, allocating 80% to the training set and the remaining 20% to the test set.To fine-tune the hyperparameters, we employed an internal 10-fold stratified cross-validation approach.This technique ensures each fold maintains a balanced representation of class labels, mirroring the overall dataset distribution.Optimization across all models was achieved through the Adam optimizer, with a dynamic learning rate and an exponential decay of 0.96.We initiated the learning rate at 5 × 10 −5 .The training process was designed to run for up to 10,000 epochs, incorporating an early stopping mechanism activated after 35 epochs without improvement in validation accuracy.This criterion ensured the retention of the optimal model weights that yielded the highest validation accuracy.
Evaluation of model performance was conducted using established diagnostic metrics: • Accuracy: Measures the overall proportion of true positive and true negative predictions across the dataset, providing a straightforward indicator of model performance.
• Sensitivity: Also known as the true positive rate, this metric quantifies the model's ability to identify positive cases correctly.High sensitivity implies few false negatives, making it critical for conditions where missing a positive case has severe implications.• Specificity: Reflects the model's proficiency in identifying negative cases accurately, with high specificity indicating minimal false positives.This is crucial in avoiding unnecessary concern or treatment for healthy individuals.
• AUC : This represents the model's capability to distinguish between classes across varying thresholds.A higher AUC value signifies better overall performance, encapsulating the trade-off between sensitivity and specificity.

Classification results
Our investigation observed several critical insights from the classification results obtained using various model configurations on the OASIS-3 dataset (Table 1).Notably, models leveraging three-dimensional inputs outperformed their two-dimensional counterparts, likely due to the additional spatial information in 3D scans that facilitates learning more complex features.Furthermore, our analysis revealed that MRI scans in 2D or 3D formats consistently provided superior results to amyloid PET scans, with an accuracy difference of approximately 8-10%.While PET scans alone show promise, our findings reinforce that MRI scans are inherently more informative for our study.Interestingly, transfer learning between modalities did not yield improvements over uni-modal or multi-modal approaches.This suggests that features specific to one scan type might not directly apply to another, underscoring the complexity of cross-modality feature applicability.Excluding the robust performance of the 3D MRI model, multi-modal strategies, especially the fusion model, exceed uni-modal and transfer learning approaches.With the fusion model reaching an impressive 95% accuracy, our analysis validates the superiority of integrating multiple neuroimaging modalities.This suggests that MRI and PET scans www.nature.com/scientificreports/fulfill complementary roles in disease prediction, with MRI proving crucial in uni-modal scenarios.The fusion model's remarkable sensitivity is particularly advantageous for disease detection, where minimizing false negatives is critical, even though the 3D MRI model showed higher specificity.Given the priority of detecting disease presence accurately, the fusion model's high sensitivity is deemed more beneficial for clinical applications, as it ensures that fewer cases are overlooked at the initial screening stage.Notably, the combined input model, which leverages both PET and MRI data, offers a balanced solution, bridging the gap between PET-specific and MRIspecific models with accuracies of approximately 81 and 91%, respectively.This model's versatility in handling PET or MRI data makes it valuable, particularly in diverse clinical settings where scan availability may vary.Moreover, the prioritization of sensitivity in our evaluation reflects a strategic choice to favor detecting as many true positive cases as possible, recognizing the critical nature of early disease identification and intervention.
In analyzing the performance of our best-performing model, the fusion approach, it is noteworthy that all false negatives were from individuals with a clinical dementia rating score of 0.5.This score places these patients in a "gray zone", where the detection of AD presents a more significant challenge due to the subtlety of the symptoms and the early stage of cognitive decline.Despite this inherent difficulty, it is remarkable that our model was able to correctly classify 80% of all patients within this ambiguous category in the test set.This high rate of accurate classification underscores the fusion model's effectiveness, particularly in steering the complexities of early-stage AD diagnosis, where traditional methods may fail.
In our literature review, as summarized in the "Related Work" section, our findings demonstrate a performance that aligns well with existing research utilizing the OASIS and ADNI datasets for AD detection through deep learning techniques.Notably, our approach consistently matches or surpasses the results of significant recent studies.Our results' alignment with state-of-the-art performances further validates the multi-modal strategy's value in enhancing AD diagnostic processes, affirming the importance of incorporating diverse neuroimaging data for more accurate classification.

Explainability results
Our study used Grad-CAM to identify the brain regions most instrumental for classification by our models, analyzing both MRI and PET scans across positive and negative AD groups.The findings in Table 2 reveal significant insights into the neuroanatomical basis of AD detection.
For MRI scans, a notable consistency exists in the regions highlighted across both groups, with the Temporal Lobe emerging as the most critical area for classification.This observation aligns with established research indicating Temporal Lobe atrophy as a critical predictor of AD [35][36][37] and other dementia forms 37,38 , particularly emphasizing the significance of the Medial Temporal Lobe [39][40][41][42][43][44] .Interestingly, the Middle Cingulate Gyrus and the Left Inferior Parietal Gyrus were identified as unique indicators for the negative and positive groups [45][46][47] .Additionally, the Precentral Gyrus and Precuneus, both associated with AD pathology, were identified as relevant, supporting their roles as early biomarkers of AD 48,49 .
PET scans revealed a substantial overlap in significant regions between the positive and negative groups, similar to MRI findings.Apart from the Middle Temporal Gyrus, Precentral Gyrus, and Precuneus, the Frontal Gyrus (encompassing the superior, middle, and inferior triangular parts) was markedly significant in AD detection [50][51][52] .This is consistent with literature documenting the Frontal Lobe's involvement in AD through alterations and hypometabolism, underscoring its importance in early-stage AD and frontotemporal dementia [53][54][55][56][57] .
The overlap of significant regions across both groups, for MRI and PET, suggests that our model consistently focuses on the same areas for discrimination, albeit with some variation between modalities.This variation underscores the complementary nature of MRI and PET information, supporting our hypothesis that leveraging both modalities can enhance diagnostic accuracy.These insights are further illustrated in Fig. 3, showcasing the Grad-CAM heatmaps and the pivotal role of identified regions in AD prediction.
Overall, our findings substantiate the value of multi-modal imaging in AD detection, highlighting nonredundant, complementary information provided by MRI and PET scans.This approach aligns with existing neuropathological understanding and opens avenues for more subtle and effective diagnostic strategies.www.nature.com/scientificreports/

Conclusion
Our study explored the development of a multi-modal diagnostic model for AD, leveraging both 3D MRI and amyloid PET imaging.Our findings affirm the hypothesis that these modalities offer distinct yet complementary insights, enhancing the construction of predictive models for AD.The experimental outcomes, both quantitative and qualitative, demonstrate that our proposed approaches not only align with but also potentially surpass current state-of-the-art methods, identifying key brain regions associated with AD in concordance with contemporary research.However, our study is not without its limitations.Due to computational constraints, we selected only 50 slices from the axial plane for both PET and MRI analyses, which, while practical, may not fully capture the comprehensive spatial information available across all three anatomical planes (sagittal, coronal, and axial).This selection process potentially limits the depth of our predictions and interpretations.Furthermore, our method of averaging frames over time, though necessary for the manageability of PET scans, results in a loss of temporal resolution that could otherwise offer additional diagnostic insights.Future research could benefit from integrating full volumetric (4D) data to preserve temporal dynamics despite the increased computational demands or employing models that pinpoint the most diagnostically relevant time frames.Furthermore, our findings underscore that although PET scans exhibit promise, MRI scans are inherently more aligned with our study's goals, providing richer diagnostic information.While our research did not directly target the classification of patients based on amyloid positivity or negativity, pursuing an in-depth analysis of amyloid PET's capacity to differentiate amyloid-positive from -negative individuals promises to shed light on its utility, particularly for devising targeted interventions for neurodegenerative ailments.
Lastly, considering the pivotal involvement of the Hippocampus and Medial Temporal Lobe in AD pathology, comparing our models' sensitivity and specificity with metrics derived from MRI measurements of these areas' volumes could significantly enhance our insight into the comparative diagnostic value of diverse neuroimaging biomarkers.Our current focus has been on deploying Convolutional Neural Networks to analyze structural MRI and amyloid PET data, leveraging deep learning's robust pattern recognition for a holistic evaluation of brain imaging data.Anticipating future research directions, integrating a comparative volumetric analysis represents a promising expansion of our work, aiming to combine the strengths of both volumetric and pattern recognition methodologies for a richer diagnostic toolkit. https://doi.org/10.1038/s41598-024-56001-9

Figure 1 .
Figure1.This figure illustrates the adjustment in label distribution for subject "OAS30040" before and after the post-processing step designed to mitigate false positives and false negatives.The horizontal axis represents the timeline of the subject's participation in the study, measured in days since their initial entry.

Figure 2 .
Figure 2. Architecture of the fusion model.

Figure 3 .
Figure 3. Three-dimensional visualizations created from multiple axial brain slices, showcasing Grad-CAM heatmaps for the positive class (a) MRI, (b) PET.

Table 1 .
The classification performance of the different models, as evaluated on the test set, is presented in ascending order based on accuracy.All metrics are expressed as percentages.

Table 2 .
Key brain regions identified by MRI and PET for both positive and negative AD groups.